Abstract - Cross-correlation is a popular signal processing technique used in numerous location tracking systems for obtaining reliable
range information. However, its efficient design and practical implementation have not yet been achieved on mote platforms that are
typical in wireless sensor networks due to resource constraints. In this paper, we propose SparseS-XCorr: cross-correlation via
structured sparse representation, a new computing framework for ranging based on $\ell_1$-minimization [1] and structured sparsity. The
key idea is to compress the ranging signal samples on the mote by efficient random projections and transfer them to a central device,
where a convex optimization process estimates the range by exploiting the sparse signal structure in the proposed correlation
dictionary. Through theoretical validation, extensive empirical studies and experiments on an end-to-end acoustic ranging system
implemented on resource limited off-the-shelf sensor nodes, we show that the proposed framework can achieve up to two orders of
magnitude better performance compared to other approaches such as working on DCT domain and downsampling. Compared to the
standard cross-correlation, it is able to obtain range estimates with a bias of 2-6 cm with 30% and approximately 100 cm with 5%
compressed measurements. Its structured sparsity model is able to improve the ranging accuracy by 40% under challenging recovery
conditions (such as high compression factor and low signal-to-noise ratio) by overcoming limitations due to dictionary coherence.

Index Terms - Ranging, Location Sensing, Positioning, Cross-Correlation, Sparse Approximation, Compressed Sensing,
$\ell_1$-Minimization, Structured Sparsity
1 Introduction

Location sensing is a vital enabling technology for numerous
applications in the field of binaural science, acoustic
source detection, target motion analysis, sensor networking,
mobile robot navigation, mobile computing, etc. While GPS
remains the de facto solution for outdoor positioning,
its inability to serve GPS-denied environments (such as
indoor and obstructed outdoor) makes location estimation
still a fundamental problem. Localization is a two-step
process. The first step is to measure the separation distance
(or range) of the unknown entity (that needs to be localized)
from at least three positioned entities (or known locations).
These measurements are subsequently utilized in the second
step that multilaterates the position estimate using a
spatially constrained optimization framework. This strong
dependency of the positioning accuracy on the
distance measurement makes ranging a crucial prerequisite
for localization.

Challenges. Acoustic and radio ranging technologies have
matured significantly in the last few decades. It is now well
understood that highly accurate results can be achieved by
measuring the travel time (footnote 1) of the ranging signal. However,
the resources required for signal detection are a deciding
factor for the cost, size and weight of the sensing platform,
and this essentially strikes a trade-off between (localization
accuracy, coverage range) and energy efficiency. Low-cost
and low-power systems estimate the arrival time of the
pulse by utilizing simple detection schemes (such as empirical
thresholding of the leading pulse edge [2]). Nevertheless,
they turn out to be less reliable due to their limited
computational capability to counter environmental noise and
multipath reflections. An established methodology to
overcome these limitations is to broaden the range of signal
frequencies and distribute the energy between the various
multiple paths, and subsequently apply a matched filter at the
receiver end to count the elapsed time samples by resolving
those multiple propagation paths. Its benefits are twofold:
broadband signals reduce the chance of the entire signal
fading at any particular time, while matched filters allow
for their processing and form a strong pulse at the
line-of-sight (LoS) path by increasing the overall signal-to-noise
ratio (SNR) without using excess transmission power.

There are numerous in-air and underwater ranging systems
that have widely used these techniques to
deliver remarkable (accuracy vs. range) performance, but
at the expense of specialized computing platforms (such as
DSP processors) that are both power intensive and costly.
Such stringent needs pose a major challenge to the field
of wireless sensor networks (WSN) that aim to achieve
similar functional capability on constrained devices with
high restrictions on data sensing rates, link bandwidth,
computational speed, battery life and memory capacity (less
than 50 kB of code memory and 10 kB RAM) [10]. This
has been the primary factor that has greatly limited the
realisation of sophisticated algorithms (such as the matched
filter). This problem can be simplified by designing a
lightweight signal detection and post-processing mechanism that
not only serves the purpose of sample counting, but is also
suitable for running on constrained embedded platforms
typically used in WSNs. Motivated by the need to design
such a mechanism, we propose Struct-Sparse-XCorr.

1. Travel time is interchangeably referred to as: time-of-flight (ToF),
time-of-arrival (ToA), propagation delay, or time delay in the
rangefinding literature.

Contributions. Struct-Sparse-XCorr (or StructS-XCorr):
cross-correlation via structured sparse representation is a
new computing framework for ranging based on
$\ell_1$-minimization [1] and structured sparsity. It is based on a
mechanism to compress and transmit the condensed ranging
data to a more resourceful offloading device (or
base-station), wherein the time delay of the ranging signal can
be efficiently recovered to determine the range.
Cross-correlation is the conventional method of obtaining this
parameter; but, given its sparse information content and
structure, we make use of the theoretical results in structured
sparse approximation to achieve a similar performance. The
underlying information theory suggests that a signal can be
recovered by $\ell_1$-minimization [1], when its representation is
sufficiently sparse with respect to an over-complete dictionary
of base elements. The recovery model (an optimization
framework that bears resemblance to Lasso in statistics),
instead of penalizing the number of nonzero coefficients
directly (e.g., the $\ell_0$-norm), penalizes the $\ell_1$-norm of
the sparse coefficients in the linear combination.

We propose a new dictionary that combines the information
sparsity along the time-delay search dimension,
and achieves up to two orders of magnitude better sparse
representation and performance compared to standard
approaches such as working on DCT domain and downsampling.
StructS-XCorr reduces the ranging inaccuracies
induced by dictionary coherence by approximately 40%
for signals subjected to high compression factor and/or
received with low SNR levels.

We empirically validate our hypothesis in real-world indoor
and outdoor setups. With respect to cross-correlation,
we show that StructS-XCorr obtains range estimates with a
relative error of less than 2 cm by using 30% compressed
measurements, and approximately 60 cm relative error with
5% measurements only. We also address the problems of
slower compression speed and incorrect peak identification
(important for estimating range) by devising a
divide-and-conquer method.

We present the design and implementation of an
end-to-end acoustic ranging system consisting of Tmote Invent
(receiver) nodes and a custom built audio (transmitter)
node. The results show a relative ranging and 2D position
error of less than 4 cm over cross-correlation using 30-40%
compressed measurements, but with significant energy
savings of two orders of magnitude.

To support our contributions, we present the design of
SparseS-Xcorr and its empirical studies in the next section,
which is then followed by the description of the acoustic
ranging system and its evaluation. Finally, we
survey related work and summarize the paper
with concluding remarks in the closing sections.

2 The Design of StructS-XCorr 

To ground our discussion, in this section, we first present
the details of time-based ranging using cross-correlation and
structured sparse approximation. We build on this background
to cast the ranging problem into the new computation
framework of StructS-XCorr, and then follow it up with its
empirical analysis.

2.1 An Overview of Time-based Ranging 

Previous studies have shown that the most successful techniques
for estimating the precise distance between two devices are
based on measuring the travel time of the signal propagation
between them [14]. The reliability of this measurement
depends on many factors, one of which is the robustness of
examining and estimating the energy of the received signal.
In this regard, the matched filter is the state-of-the-art in
detection technology.

A matched filter is implemented by cross-correlating the
received signal $x(t)$ with the transmitted signal replica $p(t)$.
Cross-correlation (X-Corr) of $p(t)$ and $x(t)$ is a sequence $s(\tau)$
defined as:

$$s(\tau) = \int_{t=-\infty}^{t=+\infty} p(t + \tau)\, x(t)\, dt \qquad (1)$$

where $\tau \in \mathbb{R}$ is the time shift (or lag) parameter.
This operation $s(\tau)$ results in correlation peaks where the
position of the peaks provides a measure of the arrival time
of the different multipaths. The index of the first tallest
correlation peak is the estimate of the pulse arrival time
of the LOS path, which is a direct measure of the range.
Generally, $x(t)$ is acquired for a (finite) minimum time $t = t_a$
given by:

$$t_a \ge \frac{d_c}{v_s} + t_p + t_r \qquad (2)$$

where $d_c$ is the channel length between the transmitter
and the receiver, $v_s$ is the speed of the ranging signal in
the medium, $t_p$ is the time-period of the transmitted signal
$p(t)$, and $t_r$ is the approximate reverberation time within
which the echoes from the transmitted pulse should have
fallen below an acceptable level before the next pulse is
emitted. The corresponding discrete-time signals of $p(t)$ and
$x(t)$ obtained at a sampling rate (hertz/samples per second)
of $F_s$ are given as: $p[n_p] = p[t_p F_s]$ and $x[n_a] = x[t_a F_s]$,
$0 < n_p, n_a < \infty$. Therefore, $p(t)$ and $x(t)$ can be represented
as vectors $p \in \mathbb{R}^{n_p}$ and $x \in \mathbb{R}^{n_a}$. The time delay is obtained
by finding:

$$\hat{\tau} = \arg\max_{\tau} |s(\tau)| \qquad (3)$$
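
The peak search described above can be illustrated with a minimal discrete-time sketch in Python. It evaluates the correlation sequence of Eq. 1 over all lags and picks the lag with the largest magnitude, as in Eq. 3. The function names, the toy vectors and the 343 m/s sound speed are illustrative assumptions and are not part of the system described in this paper.

```python
def cross_correlate(p, x):
    """Return {lag: s(lag)} with s(lag) = sum_t p[t + lag] * x[t]."""
    n_p, n_a = len(p), len(x)
    s = {}
    for lag in range(-(n_a - 1), n_p):
        total = 0.0
        for t in range(n_a):
            k = t + lag
            if 0 <= k < n_p:
                total += p[k] * x[t]
        s[lag] = total
    return s


def estimate_lag(p, x):
    """Lag with the largest correlation magnitude (discrete peak search)."""
    s = cross_correlate(p, x)
    return max(s, key=lambda lag: abs(s[lag]))


def lag_to_range(lag, sample_rate_hz, speed_m_per_s=343.0):
    """Convert a lag in samples to a distance; 343 m/s is an assumed sound speed."""
    return abs(lag) / sample_rate_hz * speed_m_per_s


# Toy example: the reference p arrives 4 samples into the recorded trace x.
p = [1.0, -1.0, 2.0]
x = [0.0] * 4 + p + [0.0] * 3
lag = estimate_lag(p, x)
```

With this sign convention (the reference is shifted against the acquired trace), a delayed arrival shows up at a negative lag, so the sketch takes the magnitude of the lag when converting it to a distance. The double loop also makes the quadratic cost of the time-domain search visible.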

Road-map. Computing $\hat{\tau}$ (Eq. 3) is expensive, and
demands high memory, computation and energy
resources. Considering the constraints of typical WSN
platforms, it is desirable to scale down its complexity with
a simpler process that is still capable of precisely
estimating $\hat{\tau}$. This motivates the scope for a new framework.

Fig. 1: Range estimation by cross-correlation. (a) Time based
representation. (b) Cross-correlation based representation. The
information content is sparse as only the time delay value
corresponding to the correlation peak is useful. The figure depicts
the same received waveform in two different (time and
cross-correlation) representations. Note that in the figure: LOS
stands for line-of-sight and MP is expanded as multipath.

Fig. 1(a) shows a received signal trace recorded for a
duration of 0.1 s sampled at 48 kHz, and its cross-correlation
with the reference copy (a linear chirp of 1-20 kHz/0.01 s) is
depicted in Fig. 1(b). Ideally, only a single dominant peak
should be observed at the correct time shift; however, due
to signal and noise interference, peaks of smaller magnitude
may also coexist. Fig. 1(b) reiterates this principle,
where the correlation peak is the only useful information,
and is representative of the signal's time delay. Therefore,
our idea is to exploit the underlying information sparsity
in the signal model to design a simpler acquisition scheme
that supports efficient compression, and later recovery. In
other words, our problem statement is to: obtain the
cross-correlation result (unknown) s using significantly fewer (known)
observations of x based on the sparsity structure of the problem.
In the next section, we discuss the theory of sparse
approximation and structured sparsity that can exploit this feature.


2.2 Sparse Approximation and Structured Sparsity 

Motivation insight. One can accurately and efficiently
recover the information of a high dimensional signal (as x)
from only a small number of compressed measurements,
when the signal-of-interest is sufficiently sparse in a certain
transform domain (e.g., [15]).

The rationale of $\ell_1$-minimization. Using the sparsifying
domain, referred to as a dictionary $\Psi \in \mathbb{R}^{n \times d}$ (with full
rank), any discrete time signal $x \in \mathbb{R}^{n}$ can be represented
as a linear combination of columns of $\Psi$ as:

$$x = \Psi s = \sum_{i=1}^{d} s_i \psi_i \qquad (4)$$

where $s \in \mathbb{R}^{d}$ is a coefficient vector of x in the $\Psi$ domain,
and $\psi_i$ is a column of $\Psi$. If s is sparse enough, then the
solution to an underdetermined system of the form $x = \Psi s$
(where the number of unknowns d is greater than the
number of observations n) can be found using the following
$\ell_0$-minimization problem, where the $\ell_0$-"norm" counts the
number of nonzero entries in a vector.

$$(\ell_0): \quad \hat{s}_0 = \arg\min \|s\|_0 \quad \text{subject to: } x = \Psi s \qquad (5)$$

However, this problem of finding the sparsest solution
($\ell_0$-minimization) of an underdetermined system of linear
equations is NP-hard [13]. As an alternative, Candes et al.
in [16] and Donoho in [17] show that if s is sparse enough
and the matrix $\Psi$ satisfies the Restricted Isometry Property (RIP), then
the $\ell_0$-minimization problem (Eq. 5) has the same sparse
solution as the following $\ell_1$-minimization problem that
can be solved in polynomial time by linear programming
methods.

$$(\ell_1): \quad \hat{s}_1 = \arg\min \|s\|_1 \quad \text{subject to: } x = \Psi s \qquad (6)$$

However, due to noise (white Gaussian) $v \in \mathbb{R}^{n}$ present
in real data, x may not be exactly expressed as a sparse
superposition of s, and so, Eq. 4 needs to be modified to:

$$x = \Psi s + v \qquad (7)$$

where v is bounded by $\|v\|_2 < \epsilon$. The sparse s can still be
recovered accurately by solving the following stable
$\ell_1$-minimization problem via second-order cone programming.

$$(\ell_1^s): \quad \hat{s}_1 = \arg\min \|s\|_1 \quad \text{subject to: } \|\Psi s - x\|_2 < \epsilon \qquad (8)$$

It is important to note that RIP is only a sufficient but
not a necessary condition. Therefore, $\ell_1$-minimization may
still be able to recover the sparse s accurately, even if the
sensing matrix does not satisfy RIP. In fact, the use of
$\ell_1$-minimization to find sparse solutions has a rich history.
It was first proposed by Logan [18], and later developed
in [1] and a series of subsequent works (up to [24]). Here, we
use $\ell_1$-minimization to solve the
cross-correlation problem via sparse representation.
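
To make the noisy recovery problem concrete, the following Python sketch solves its Lagrangian (Lasso) form with iterative soft-thresholding (ISTA). This is a swapped-in textbook solver chosen for brevity, not the second-order cone program used in this paper. The names `soft_threshold` and `ista`, the 0.5 scaling of the data term and the step-size argument are assumptions made for illustration.

```python
def soft_threshold(v, t):
    """Shrink v towards zero by t (proximal operator of the l1-norm)."""
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0


def ista(psi, x, lam, step, iters=200):
    """Approximate argmin_s 0.5*||psi s - x||_2^2 + lam*||s||_1.

    psi is a list of n rows with d entries each; step should not exceed
    1 / (largest eigenvalue of psi^T psi) for the iteration to converge.
    """
    n, d = len(psi), len(psi[0])
    s = [0.0] * d
    for _ in range(iters):
        # residual r = psi s - x
        r = [sum(psi[i][j] * s[j] for j in range(d)) - x[i] for i in range(n)]
        # gradient of the data term, g = psi^T r
        g = [sum(psi[i][j] * r[i] for i in range(n)) for j in range(d)]
        s = [soft_threshold(s[j] - step * g[j], step * lam) for j in range(d)]
    return s


# Sanity check with an identity dictionary, where the minimizer is known in
# closed form: each entry of x is soft-thresholded by lam.
identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
s_hat = ista(identity, [3.0, -0.5, 1.5], lam=1.0, step=1.0, iters=5)
```

The small entry of the input is driven exactly to zero while the large entries are only shrunk, which is the sparsity-promoting behaviour that the $\ell_1$ penalty is chosen for. For an underdetermined dictionary the same function applies, but the step size then has to be set from the spectrum of the dictionary.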

Dimensionality reduction by random linear projections.
As shown in [25] by the Johnson-Lindenstrauss Lemma,
the $\ell_2$ distance is preserved in the projection domain with
high probability by random projections. In other words,
all the useful information is preserved in the projection
domain. Hence, $\ell_1$-minimization can still be used to recover
the sparse s from the projected measurements with an
overwhelming probability, even though its dimension is
significantly reduced. More precisely, this projection from
high to low dimensional space can be obtained by using a
random sensing matrix $\Phi \in \mathbb{R}^{m \times n}$ as:

$$y = \Phi x = \Phi(\Psi s) \qquad (9)$$

where $m \ll n$ and $y \in \mathbb{R}^{m}$ is the measurement vector.
In practice, if s has $k \ll d$ nonzero coefficients, then the
number of measurements is usually chosen to be:

$$m \ge 2k \log(d/m) \qquad (10)$$

The sparsity level of s can be verified if the reordered
entries of its coefficients decay like the power law; i.e., if
s is arranged in the decreasing order of magnitude, then
the $d^{\text{th}}$ largest entry obeys $|s|_{(d)} \le \text{Const} \cdot d^{-r}$ for $r \ge 1$.

For sparse s, the $\ell_2$-norm error between its sparsest and
approximated solution also obeys a power law, which means
that a more accurate approximation can be obtained with
the sparsest s. However, for efficient recovery, the columns
of $\Phi\Psi$ should be as independent as possible so that the
information regarding each coefficient of s is contributed
by a different direction; and this is achievable if $\Phi$ and $\Psi$ are
more incoherent. Ensembles of random matrices sampled
independently and identically (i.i.d.) from Gaussian and ±1
Bernoulli distributions are largely incoherent with any fixed
dictionary, and hence, permit computationally tractable
recovery of s.
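
The rule of thumb for the number of measurements is implicit in m, so it is easiest to explore numerically. The sketch below assumes the natural logarithm and uses illustrative values only: k = 5 is a hypothetical sparsity level, and d is taken as the column count of a dictionary built for a 1440-sample trace. Neither the function names nor these values come from the paper.

```python
import math


def satisfies_bound(m, k, d):
    """Check m >= 2 k log(d / m), using the natural logarithm (an assumption)."""
    return m >= 2 * k * math.log(d / m)


def smallest_m(k, d):
    """Smallest m in 1..d that satisfies the bound."""
    for m in range(1, d + 1):
        if satisfies_bound(m, k, d):
            return m
    return d


# Illustrative values: d = 2 * 1440 - 1 dictionary columns, k = 5 nonzeros.
d, k = 2 * 1440 - 1, 5
m_min = smallest_m(k, d)
```

The bound is a guideline rather than a guarantee. In practice the usable m also depends on the noise level and on the coherence of the dictionary, which is why far more measurements than this minimum may be needed for reliable ranging.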

Sparse approximation with structured sparsity. The theory
of sparse approximation is applicable to a sensing problem
if the underlying signal can be sparsely represented in some
dictionary. A useful feature is that the dimensionality
reduction operation is completely independent of its recovery via
$\ell_1$-minimization. A sparse signal can be captured efficiently
using a limited number of random measurements that is
proportional to its information level. The $\ell_1$-minimization
process does its best to correctly recover this information
with the knowledge of only the dictionary that sparsely
describes the signal of interest, when the noise power $\|v\|_2$ is
small enough and the dictionary is sufficiently incoherent.

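The mutual coherence of $\Psi \in \mathbb{R}^{n_a \times (2n_a - 1)}$, denoted as
$\mu(\Psi)$, is given as:

$$\mu(\Psi) = \max_{1 \le i < j \le (2n_a - 1)} \frac{|\langle \psi_i, \psi_j \rangle|}{\|\psi_i\|_2 \, \|\psi_j\|_2} \qquad (11)$$

The worst-case coherence $\mu(\Psi)$ corresponds to the largest
absolute value of the (normalized) inner product between two distinct
dictionary elements, and is bounded as: $0 \le \mu(\Psi) \le 1$.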
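
As a small illustration of the coherence measure, the following Python sketch computes the largest normalized inner product between distinct columns of a toy dictionary. The function name and the three example atoms are assumptions for illustration, and the columns are assumed to be nonzero.

```python
import math


def mutual_coherence(columns):
    """Largest normalized |inner product| between two distinct columns."""
    norms = [math.sqrt(sum(v * v for v in col)) for col in columns]
    mu = 0.0
    for i in range(len(columns)):
        for j in range(i + 1, len(columns)):
            dot = sum(a * b for a, b in zip(columns[i], columns[j]))
            mu = max(mu, abs(dot) / (norms[i] * norms[j]))
    return mu


# Three atoms in R^2: two orthogonal axes plus their diagonal.
atoms = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
mu = mutual_coherence(atoms)
```

The two axes alone have zero coherence. Adding the diagonal atom raises it to about 0.71, which shows how redundancy in an over-complete dictionary increases coherence.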

While it has been proven that any designed $\Psi$ is largely
incoherent with $\Phi$, it still may not be good enough for parameter
estimation, especially under high noise conditions. Therefore,
it is important to prevent coherent pairs of dictionary
elements from appearing in the approximation process.

Candes et al. (Theorem 1.2 in their work), in fact, have shown that
for conventional sparse approximation with coherent and
redundant dictionaries, the reconstruction error is upper
bounded by both the noise level and the best k-term
approximation error. In another recent work, Duarte and
Baraniuk (Theorem 1 in their work) have shown that with a structured
sparsity model (and using a greedy recovery approach), the
upper bound of the reconstruction error decays exponentially
to the noise level with an increase in the number of
iterations. Thus motivated by the significant benefits of the
structured sparsity model, we propose StructS-XCorr that is
detailed in the next subsection.


2.3 Details of StructS-XCorr 

In this section, we present the details of the new dictionary 
design followed by the computing model of StructS-XCorr. 

2.3.1 Design of Representation Dictionary 
Design guidelines. The general criterion for designing a
reliable representation dictionary $\Psi$ is that it should sufficiently
sparsify the signal x. The one-dimensional search over the
time delay space introduces an important design criterion:
$\Psi$ should be able to preserve the propagation channel
profile information while adhering to the basic design
guidelines outlined by the underlying theory. We also define
an additional criterion: $\Psi$ should facilitate a faster
recovery mechanism that implicitly derives the time delay
result without reconstructing the original signal. Therefore,
the design challenge is to identify and construct a befitting
representation dictionary that satisfies all of the aforesaid
requirements.

Design intuition. To this end, we were guided by Eq. 1,
where the locally generated reference copy is swept
over all possible (positive and negative)
time delay values. This suggests that the received signal x
could be sparsely represented in a single-dimensional space
if we design a representation dictionary whose column
elements enumerate all possible time delay
combinations.

Fig. 2: Signal representation in different dictionaries (x-axis:
samples). The signal has a sparser representation in the correlation
dictionary than its FFT and DCT counterparts by more than two
orders of magnitude.

Design execution. For realizing this design goal, we adopt
a positive and negative time-shifted Hankel matrix design
of the transmitted signal vector p as $\Psi$. Note that reversing
the time-shifting order results in a Hankel matrix. We refer
to this newly designed $\Psi$ as the correlation dictionary.
Depending on the lengths of x and p, the following two
categories can be identified.

• Case-1 ($t_a = t_p$): Vectors p and x are of equal dimensions
with $n_a$ samples. The elements of $\Psi \in \mathbb{R}^{n_a \times (2n_a - 1)}$ are
given as:

$$\Psi(:, i) = \begin{cases} [\,\mathrm{zeros}(n_a - i) \;\; p(1 : i)\,]^T, & 1 \le i \le n_a \\ [\,p(i + 1 - n_a : n_a) \;\; \mathrm{zeros}(i - n_a)\,]^T, & (n_a + 1) \le i \le (2n_a - 1) \end{cases} \qquad (12)$$

where $\Psi(:, i)$ denotes the ith column, $[\cdot]$ denotes a vector
of length $n_a$, $\mathrm{zeros}(i)$ denotes a zero vector of length i,
$(\cdot)^T$ denotes the transpose of a vector (matrix), and $p(i : j)$
denotes a vector of elements with indices from i to j of the
input sample set p.

• Case-2 ($t_a > t_p$): The size of x is greater than p, and
so, the system is balanced by right zero-padding $(n_a - n_p)$
entries to p.
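
The two cases above can be sketched in a few lines of Python. The snippet follows our reading of the column definition: early columns hold a growing prefix of p pushed to the bottom, and later columns hold a shrinking suffix pushed to the top. The function name and the tiny example signal are illustrative assumptions.

```python
def correlation_dictionary(p, n_a):
    """Build the columns of a correlation dictionary for reference signal p.

    If p is shorter than n_a it is right zero-padded (Case-2). Returns a
    list of 2*n_a - 1 columns, each of length n_a.
    """
    if len(p) > n_a:
        raise ValueError("reference signal longer than the acquired trace")
    p = list(p) + [0.0] * (n_a - len(p))
    columns = []
    for i in range(1, 2 * n_a):           # 1-based column index
        if i <= n_a:
            col = [0.0] * (n_a - i) + p[:i]
        else:
            col = p[i - n_a:] + [0.0] * (i - n_a)
        columns.append(col)
    return columns


psi = correlation_dictionary([1.0, 2.0, 3.0], 3)
```

For the three-sample example the columns are [0, 0, 1], [0, 1, 2], [1, 2, 3], [2, 3, 0] and [3, 0, 0]. Each entry depends only on the sum of its row and column index, which is the defining property of a Hankel matrix.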

Other popular dictionaries such as the FFT and DCT,
in contrast, do not provide as good a sparse depiction as
the proposed correlation dictionary, and also, do not satisfy
the remaining two requirements (important for ranging).
Fig. 2 compares their sparsity levels (for an indoor high
multipath channel) by sorting the samples by their
magnitudes. The fastest decay characteristic (or the smallest k) is
observed in the correlation domain, which therefore offers the most
sparse representation. This implies that the most accurate
approximations (or range estimates) can be obtained with
the correlation dictionary using the smallest number of
measurements m (Eq. 10).
Fig. 3: Validation: StructS-XCorr vs. {S-XCorr, XCorr}. (a) Standard
cross-correlation (XCorr: 1440 samples with 480 samples). (b) Recovery
via sparse approximation (S-XCorr: 432 samples with 480 samples).
(c) Recovery via structured sparse approximation (StructS-XCorr: 432
samples with 480 samples). The detection accuracy of StructS-XCorr is
on a par with S-XCorr and XCorr, but with better robustness against
multipaths and low-noise peaks.


2.3.2 Compression and Recovery 

Compression. The dimensions of $x \in \mathbb{R}^{n_a}$ are significantly
reduced at the receiver by multiplying it with a random
sensing matrix $\Phi \in \mathbb{R}^{m \times n_a}$, resulting in the measurement
vector $y \in \mathbb{R}^{m}$ ($m \ll n_a$) as: $y = \Phi x$. m is related to $n_a$
by the compression factor $\alpha$ given as: $m = \alpha n_a$ where
$\alpha \in [0, 1]$. For example, $\alpha = 0.10$ means that the information
in x has been compressed by 90%. $\Phi$ is a binary sensing
matrix with its entries independently and identically (i.i.d.)
sampled from a balanced symmetric Bernoulli distribution
of ±1.

$$\Phi = \frac{1}{\sqrt{m}} [\phi_{i,j}], \quad \text{where i.i.d. } \Pr(\phi_{i,j} = \pm 1) = 0.5 \qquad (13)$$

Binary ensembles have a shorter memory representation
than Gaussian ensembles, and also, alleviate operational
complexity; hence, they are economical for sensor platforms.
A balanced $\Phi$ consists of ±1 at equal probability, where each
row contains an equal number of 1's and -1's. Therefore, in each
row of $\Phi$, the sum of the elements is always zero. A balanced
$\Phi$ provides a higher probability of detection (at recovery)
if the noise in x is Gaussian. The receiver transfers m
samples of y to the base-station (BS) for post-processing.
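
The compression step can be sketched in Python with the standard library only. This is a minimal illustration under stated assumptions: the signs are kept as integers and the 1/sqrt(m) scaling is applied during the projection, rows are balanced by shuffling an equal number of +1 and -1 entries, and the generator is Python's `random.Random` with a shared seed. The paper does not specify the generator used on the mote, and all names below are hypothetical.

```python
import math
import random


def balanced_bernoulli_signs(m, n, seed):
    """m x n matrix of +1/-1 entries; every row has n/2 of each (n must be even)."""
    if n % 2:
        raise ValueError("n must be even for perfectly balanced rows")
    rng = random.Random(seed)          # the seed is shared with the base-station
    rows = []
    for _ in range(m):
        row = [1] * (n // 2) + [-1] * (n // 2)
        rng.shuffle(row)
        rows.append(row)
    return rows


def compress(x, alpha, seed):
    """Project x onto m = round(alpha * len(x)) measurements, scaled by 1/sqrt(m)."""
    n = len(x)
    m = max(1, round(alpha * n))
    signs = balanced_bernoulli_signs(m, n, seed)
    scale = 1.0 / math.sqrt(m)
    return [scale * sum(sgn * v for sgn, v in zip(row, x)) for row in signs]


x = [0.0, 1.0, -2.0, 0.5, 0.0, 3.0, 1.0, -1.0]
y = compress(x, alpha=0.5, seed=42)
```

Because only additions and subtractions of samples are needed before the final scaling, the projection avoids multiplications on the node. Regenerating the matrix from the seed at the base-station avoids transmitting it.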

Recovery via Sparse Approximation (S-XCorr). The BS
uploads the compressed measurements to a service
application on the control server. It requires a-priori knowledge
of the seed that generates $\Phi$, and the dictionary $\Psi$. Since x
can be represented sparsely as s in the dictionary $\Psi$ and x
(the received signal) is known, the desired sparse solution
s can be recovered by solving Eq. 8. However, as the
dimensions of x are reduced significantly via Eq. 9, the
reduced $\ell_1$-minimization problem for a given tolerance $\epsilon$ is
given as:

$$(\ell_1^s): \quad \hat{s}_1 = \arg\min \|s\|_1 \quad \text{s.t.: } \|\Phi\Psi s - y\|_2 < \epsilon \qquad (14)$$

Eq. 14 is known as Lasso (footnote 3) in the statistical literature, and
regularizes highly underdetermined linear systems when the desired
solution is sparse. The correlation domain coefficients $\hat{s}_1$ are
related to the various propagation (direct and reflected) paths,
where the index of the first tallest correlation coefficient peak is
the estimate of the pulse arrival time of the direct path, and thus,
provides the range.

Fig. 4: System architecture of a custom designed acoustic ranging
system for empirical characterization of StructS-XCorr.

2. Direct cross-correlation in the projection domain (using y) did not produce
desirable ranging results because y consists of random projections.

3. The minimizer of $\|x - \Psi s\|_2^2 + \lambda\|s\|_1$ is defined as the Lasso solution,
where $\lambda$ can be referred to as the inverse of the Lagrange multiplier associated with
a constraint $\|x - \Psi s\|_2 < \epsilon$. For every $\lambda$, there is an $\epsilon$ such that the two problems
have the same solution.

Recovery via Structured Sparsity and Sparse Approximation
(StructS-XCorr). In our case, $\Psi$ is not strictly incoherent
due to the repetitive nature of the elements along each row
of the matrix (which is an artifact of Hankel matrices). To
overcome the shortcomings due to dictionary coherence, we
apply the principles of structured sparsity to the S-XCorr
computing model. Although the optimal solution can be
obtained via linear programming, we adopt a computationally
efficient heuristic after executing Eq. 14. It is presented
in Procedure 1, which works as follows. In line 3 and line 4
of the procedure, we select the entry $s(l^*)$ of the coefficient
vector s with largest magnitude, which is then pushed into the
output structured sparse vector $s_s$. To prevent coherent
elements from appearing simultaneously in $s_s$, we define
(line 5) the set $\Lambda$ of all indices $\lambda$ for which the inner product
of $\psi_\lambda$ and $\psi_{l^*}$ is larger than some predefined threshold $\mu_0$.
This set indexes all dictionary elements that are coherent
with the newly selected one, and their future selection is
prevented in line 6 by setting the corresponding entries of
the vector s to zero.

Procedure 1 Structured SparseXcorr: $s_s = \mathcal{S}_k(s, \mu_0)$

Input: Coefficient vector s, target coherence $\mu_0$
Output: Structured sparse coefficient vector $s_s$

1: Initialize $s_s = 0$, $i = 1$
2: while $i \le k$ and $s \ne 0$ do
3:    $l^* = \arg\max_{1 \le l \le 2n - 1} |s(l)|$
4:    $s_s(l^*) = s(l^*)$
5:    $\Lambda = \{\lambda : |\langle \psi_\lambda, \psi_{l^*} \rangle| / (\|\psi_\lambda\|_2 \|\psi_{l^*}\|_2) > \mu_0\}$
6:    $s|_\Lambda = 0$
7:    $i = i + 1$
8: end while
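
The selection heuristic can be written as a short Python function. This is a sketch under stated assumptions rather than the authors' implementation: the inner product is normalized by the column norms, the dictionary is passed as a list of nonzero columns, and the function and variable names are hypothetical. The comments refer to the line numbers of Procedure 1.

```python
import math


def normalized_coherence(a, b):
    """|<a, b>| / (||a||_2 ||b||_2) for two nonzero dictionary columns."""
    dot = sum(u * v for u, v in zip(a, b))
    norm_a = math.sqrt(sum(u * u for u in a))
    norm_b = math.sqrt(sum(v * v for v in b))
    return abs(dot) / (norm_a * norm_b)


def structured_sparse(s, columns, mu0, k):
    """Keep at most k largest coefficients whose atoms are mutually incoherent."""
    s = list(s)                       # work on a copy; entries get zeroed below
    s_struct = [0.0] * len(s)
    picked = 0
    while picked < k and any(v != 0 for v in s):
        l_star = max(range(len(s)), key=lambda l: abs(s[l]))       # line 3
        s_struct[l_star] = s[l_star]                                # line 4
        banned = [l for l in range(len(s))                          # line 5
                  if normalized_coherence(columns[l], columns[l_star]) > mu0]
        for l in banned:                                            # line 6
            s[l] = 0.0
        s[l_star] = 0.0               # guard in case mu0 is set to 1 or more
        picked += 1                                                 # line 7
    return s_struct


columns = [[1.0, 0.0], [1.0, 1.0], [0.0, 1.0]]
result = structured_sparse([3.0, 2.5, 1.0], columns, mu0=0.6, k=2)
```

In the toy call, the second coefficient is large but its atom is coherent with the first selected atom (normalized inner product of about 0.71, above the 0.6 threshold). It is therefore suppressed, and the third coefficient is selected instead. Raising the threshold above 0.71 would let the second coefficient through.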

2.4 Analysis of StructS-XCorr 

In this section, we analyze the performance of StructS-XCorr
and identify challenges in detection reliability.

Experimental system. We conducted this study using a
custom designed acoustic ranging system (Fig. 4) with different
assembled units. The front-end of the transmitter consisted
of a COTS ribbon (speaker) transducer, driven by a
custom assembled (external) wideband power amplifier with a
tunable (5-20 times) gain controller. The receiver front-end
comprised a custom designed preamplifier unit interfaced
with a COTS Knowles microphone (SPM0404UD5). The
controller for this system was set up on a laptop, where
synchronization and ranging signals were generated,
captured and analyzed for range estimation. The operational
sequence commenced with the generation of the linear chirp
[1-20] kHz/0.01 s that was then directed into two separate
streams: first, the left input channel of the ADC of the audio card
(i.e., an electronic chirp) and second, the wideband amplifier
(i.e., an acoustic chirp). The electronic chirp is equivalent
to an RF pulse and marks the transmission time of the
acoustic chirp, which is thereafter detected by the receiver
unit and directed into the right input channel of the ADC.
The received acoustic signal is considered from the time
marker provided by the electronic chirp so as to discard
the delays incurred during the transmission stage (footnote 4). At the
processing station (that functionally replicates the receiver
post-processing stage and BS), the acquired samples are
first compressed and subsequently recovered to estimate the
range.

2.4.1 Ranging Challenges and Mitigation 

Analysis: basic ranging performance. In this experimental
setup, the transmitter and the receiver were placed 1.5 m
apart. The ranging process was performed with the receiver
configured to record for 0.03 s, just long enough to capture
the ranging signal along with its multipaths. The audio card
was configured to sample at 48 kHz; hence, the transmitted
signal p and the acquired trace x consisted of 480 and 1440
samples respectively. Using $\alpha = 0.30$, x was compressed to
obtain the measurement vector y of 432 samples followed
by its recovery to obtain s (Section 2.3.2) using S-XCorr and
StructS-XCorr, and its accuracy was then validated against
XCorr (Eq. 1). Fig. 3(a), Fig. 3(b) and Fig. 3(c) show the
respective results, where we observe that all the methods
obtain exactly the same estimate for the position of the first
tallest peak at a negative lag of 220 samples. S-XCorr is able
to obtain the multipath profile (footnote 5), but it is not accurate as
it does not follow the same height-to-position relationship
(observe the position of peak-2 & 3 in Fig. 3(b)) as suggested
by the corresponding XCorr result shown in Fig. 3(a).
Although these parameters are not important for distance
estimation, they are nevertheless legitimate sources of
erroneous detection. StructS-XCorr, on the other hand, does
not recover multipath/low-noise peaks apart from the LoS
path (Fig. 3(c)), and therefore alleviates such anomalies.

4. The experimental setup mimics the concept of velocity-difference
TDOA (V-TDOA).
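
As a quick plausibility check of the reported peak position at a lag of 220 samples, the lag can be converted to a distance. The sound speed of roughly 343 m/s is an assumption (air at about 20 °C) and is not stated in the text:

```latex
d \approx \frac{220\ \text{samples}}{48000\ \text{samples/s}} \times 343\ \text{m/s}
  \approx 4.58\ \text{ms} \times 343\ \text{m/s}
  \approx 1.57\ \text{m}
```

This is consistent with the 1.5 m separation between the transmitter and the receiver, given that one sample at 48 kHz corresponds to about 0.7 cm of acoustic travel.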

Analysis: space and time complexity. The algorithm that runs
on the receiver (XCorr vs. compression) is the vital
point of difference. The running time of XCorr is $O(n^2)$ in
the time domain (TD-XCorr) and $O(n \log n)$ in the frequency
domain (FD-XCorr) on conventional receiver systems.
However, for WSN nodes, additional signal processing platforms
have to be added to compensate for the lack of hardware
divide or floating point support units. Therefore, we propose
an alternate data compression functionality that has a
similar time complexity ($mn \approx O(n \log n)$), but a much smaller
space complexity (compatible with the mote constraints).

We compared their performance on the experimental
system, for which we performed the same ranging process
but configured the receiver to record for 0.1 s (i.e., 4800
acquired samples). Table 1 shows the individual running
time of the TD-XCorr, FD-XCorr and compression for
different compression factors $\alpha$. We note that FD-XCorr is
approximately 30 times faster than TD-XCorr as expected from their
asymptotic results. However, the compression time (shown
as 'Compression 1-Buf') varies for different $\alpha$, and is slower
than FD-XCorr for all except $\alpha = 0.05$.

We overcome this drawback by using the simple idea of buffer-by-buffer
compression rather than one-step compression. This method divides the
acquired signal vector x of length n across b buffers of equal size,
compresses the information in each buffer, and finally assembles the
measurements in their correct order. The signal x̃ in each buffer is of
length ñ, where ñ = n/b. The random sensing matrix Φ̃ for compressing the
data in each buffer is of size [m̃ × ñ], where m̃ = αñ = α(n/b) = m/b.
The resultant measurement vector ỹ (for each buffer) is of length m̃.
The number of iterations required to process each buffer is (m̃ñ).
Therefore, the compression for b buffers takes (b m̃ ñ) = (mn/b)
iterations. This improvement can be seen in Table 1 (shown as
'Compression 10-Buf'), where we divide the 4800 samples across 10 buffers
and record the compression time for different α. The results show a
worst-case to best-case improvement of 6× to 60× over FD-XCorr. As
resource constrained WSN motes do not support floating point operations,
our proposed method is expected to yield better performance on such
platforms (shown in Section 3) than on a standard PC.
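The buffer-by-buffer idea can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the mote implementation: the function names are hypothetical, the sensing matrix uses unscaled +1/-1 entries, and reusing one small matrix for every buffer is an assumption made here for brevity. The returned operation count reproduces the (mn/b) relation derived above.

```python
import random

def bernoulli_matrix(rows, cols, rng):
    """Random sensing matrix with i.i.d. +1/-1 entries (scaling omitted)."""
    return [[rng.choice((-1, 1)) for _ in range(cols)] for _ in range(rows)]

def compress(phi, x):
    """Compute y = phi * x with plain multiply-accumulate loops."""
    return [sum(p * v for p, v in zip(row, x)) for row in phi]

def compress_by_buffers(x, alpha, b, rng):
    """Split x into b equal buffers and compress each one separately."""
    n_buf = len(x) // b                        # samples per buffer
    m_buf = max(1, round(alpha * n_buf))       # measurements per buffer
    phi = bernoulli_matrix(m_buf, n_buf, rng)  # assumption: one matrix reused for all buffers
    y = []
    for i in range(b):
        y.extend(compress(phi, x[i * n_buf:(i + 1) * n_buf]))
    return y, b * m_buf * n_buf                # measurements, multiply-accumulate count

rng = random.Random(0)
x = [rng.gauss(0.0, 1.0) for _ in range(1000)]   # stand-in for an acquired trace
y1, ops1 = compress_by_buffers(x, 0.30, 1, rng)
y10, ops10 = compress_by_buffers(x, 0.30, 10, rng)
print(len(y1), len(y10), ops1 // ops10)
```

Both calls produce the same number of measurements, while the 10-buffer variant needs one tenth of the multiply-accumulate operations and a sensing matrix that is 100 times smaller.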

Analysis: signal detection and post-processing. The process of detection
is not without errors, as the reconstructed coefficients s may have been
wrongly approximated due to measurement noise that contributes to higher
coefficient values at incorrect locations. To overcome these inaccuracies,

5. The generation of the dictionary coefficients and cross-correlation 
peaks are in the negative lag part since we have reversed the order of 
operation, wherein the reference signal was operated with the acquired 
signal. 




TABLE 1: Time Complexity Analysis

α      TD-XCorr (s)   FD-XCorr (s)   Comp. 1-Buffer (s)   Comp. 10-Buffers (s)
0.05   0.1932         0.0062         0.0042               0.0001
0.10   0.1932         0.0062         0.0077               0.0003
0.30   0.1932         0.0062         0.0218               0.0006
0.50   0.1932         0.0062         0.0361               0.0010


we use the same principle of buffer-by-buffer reconstruction at the BS as
well, which not only provides an additional clue for correct detection but
also serves as a guideline for choosing the buffer count b.

The number of buffers b is chosen such that the number of samples in each
buffer is the same as the sample count of the reference signal p. For
example, if p contains 100 samples and x consists of 1000 samples, then b
is 10. There are two benefits in making this choice. First, it restricts
the direct path signal (in the total acquired trace) to be spread across a
maximum of 2 buffers, and so guarantees that the magnitude of the
corresponding recovered coefficient always remains at least 50% of its
original estimate. Spreading the signal across more than 2 buffers (by
using a larger b) decreases the individual peak heights to smaller
magnitudes, which makes it difficult to differentiate them from the noise
floor. Second, it provides easy processing at the BS, where the operation
of right zero-padding p to make its dimensions equal to x is substituted
by fragmenting x into b buffers to match the size of p (Section 2.3.2).

The reconstruction process is performed on all b buffers, which is
followed by the signal detection and range estimation algorithm.

• Phase-1: It identifies the various correlation domain coefficient peaks
and selects the first tallest peak in each of the b buffers that is at
least 6 standard deviations above the mean. The detection is considered to
have failed for those buffers where no point qualifies as a peak. This
reduces the validation space for Phase-2 to b̂ (≤ b) buffers.

• Phase-2: If there are valid peaks in more than one buffer (i.e., b̂ > 1),
then the tallest peak (across all b̂ buffers) among them is selected as the
ranging peak. The detection is correct if this peak in buffer b_i has a
lag that is:

    • Positive ⇒ The peak in the previous buffer b_{i-1} must have a
      negative lag.

    • Negative ⇒ The peak in the next buffer b_{i+1} must have a
      positive lag.

This relationship is a result of the manner in which the signal gets
aligned in different buffers and its equivalent representation in the
correlation domain/cross-correlation (Fig. 5).

If b̂ = 1 (i.e., only a single buffer has a valid peak), then the peak
identified in Phase-1 qualifies as the ranging peak. The estimated range r
is obtained as:

$$ r = \frac{\tilde{n}\, b_{i-1} + l}{F_s} \times v_s \qquad (15) $$

where ñ is the number of samples in each buffer, b_{i-1} is the buffer
count before the detection buffer, l is the lag (in samples) of the
ranging peak in the detection buffer, F_s is the sampling frequency, and
v_s is the temperature compensated speed of sound in air.
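The two-phase detection and the range formula (Eq. 15) can be sketched as follows. This is a simplified illustration rather than the exact detector: the function names are hypothetical, the coefficient index inside a buffer is assumed to equal the lag in samples, and the positive/negative lag consistency check between neighbouring buffers is omitted for brevity.

```python
import statistics

def tallest_valid_peak(coeffs, k_sigma=6.0):
    """Return (index, magnitude) of the tallest coefficient, or None when it is
    not more than k_sigma standard deviations above the mean magnitude."""
    mags = [abs(c) for c in coeffs]
    threshold = statistics.mean(mags) + k_sigma * statistics.pstdev(mags)
    i = max(range(len(mags)), key=mags.__getitem__)
    return (i, mags[i]) if mags[i] > threshold else None

def estimate_range(buffers, fs_hz, v_sound, k_sigma=6.0):
    """Phase-1 thresholding per buffer, Phase-2 tallest-peak selection, Eq. 15."""
    n_buf = len(buffers[0])                      # samples per buffer
    valid = []
    for idx, coeffs in enumerate(buffers):
        peak = tallest_valid_peak(coeffs, k_sigma)
        if peak is not None:
            valid.append((peak[1], idx, peak[0]))  # (magnitude, buffer index, lag)
    if not valid:
        return None                              # detection failed in every buffer
    _, buffers_before, lag = max(valid)          # tallest peak across valid buffers
    return (n_buf * buffers_before + lag) / fs_hz * v_sound

# Three buffers of 100 coefficients with one clear peak in the second buffer.
buffers = [[0.0] * 100 for _ in range(3)]
buffers[1][20] = 1.0
r = estimate_range(buffers, 48000.0, 343.0)
```

Here one buffer precedes the detection buffer and the lag is 20 samples, so the estimate is (100 + 20) / 48000 × 343 m.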


[Figure 5 panels: XCorr in Buffers [Buf-1] and [Buf-2], each 480 samples
with 480 samples.]

Fig. 5: Buffer-by-Buffer Processing. The detection accuracy of
StructS-XCorr, with regard to the position of the LoS peak and the tallest
peak in each buffer, is on par with XCorr.


2.4.2 Characterization Studies and Benchmarks 

Ranging error vs. {compression factor, SNR}. The optimal choice of the
compression factor α that achieves the best accuracy with the least
measurements (or projections) m is a key design decision, as a smaller m
leads to lower storage and transmission cost. α depends on the sparsity k
of the received signal in the correlation domain, which in turn depends on
the received SNR that varies with transmission power and ranging distance.
In this subsection, we empirically study the relationship between SNR and
α. The study was conducted in the following environments.

• Case-A {outdoor, very low multipath}: A less frequently used urban
walkway, with sunny weather and an occasional mild breeze.

• Case-B {indoor, low multipath}: A quiet lecture theatre
([25 × 15 × 10] m) with a spacious podium at one end of the large room.

• Case-C {indoor, high multipath}: A quiet meeting room ([7 × 6 × 6] m)
with a big wooden table in the center and other office furniture.

The transmitter and the receiver were fixed at a constant separation
distance of 5 m. The transmit power was varied such that the received SNR
was recorded within the limits: [0-5) dB, [5-10) dB, [10-20) dB,
[20-30) dB. For reasons that will be explained in the next subsection, we
slightly modified the peak selection criteria of the detection algorithm
to choose the tallest peak if there was no valid peak (6 standard
deviations above the mean). 100 observations were collected for every
experiment. We show the relative mean error and its deviation with respect
to the (best-case) XCorr in all the results in this segment.

Fig. 6(a), Fig. 6(b) and Fig. 6(c) show the dependence of α-compression
and its recovery accuracy on the SNR of the ranging signal using S-XCorr.
Across all figures, we observe that compressing a lower SNR signal more
aggressively (i.e., with a smaller α) results in an increase in estimation
error. Fig. 6(a) for Case-A presents the clearest characterization by
negating the effect of channel multipath (though introducing an increased
background noise level), where observations with a high SNR of [20-30) dB
provide reliable range estimates using only 15% projections, while those
with a low SNR of [0-5) dB show confident results only with α = 0.30
(i.e., using more projections). Fig. 6(b) and Fig. 6(c) show the results
for Case-B and Case-C. Due to a less dominant multipath profile and
background noise in Case-B, the accuracy levels show high confidence for
α > 0.20. The situation is challenging in Case-C (due to high multipath),
and so the errors are as large as 1 m with α = 0.05, but attain stability
after α = 0.25. The cumulative probability results suggest that there is a
95% probability of incurring an additional error of < 1.5 cm indoors and
< 3 cm outdoors with α = 0.30 with respect to the XCorr estimate. Using
α > 0.30 does not improve the accuracy significantly considering the
additional overheads. Fig. 6 also shows that for applications that require
lower accuracy (e.g., 100 cm), an α as low as 0.05 is sufficient. We also
performed ranging experiments with changes in distance over 1-10 m.
Although smaller values of α (i.e., fewer projections) were good for high
SNR levels, the results with α = 0.30 were optimal, even in the worst
case, to obtain higher accuracy (< 2 cm).

Fig. 6: S-XCorr. Characterization of compression factor α with SNR.

Fig. 7: StructS-XCorr. Characterization of compression factor α with SNR.

Fig. 8: StructS-XCorr: Buffer-by-Buffer vs. Single Buffer Detection. For a
compression factor of 0.30, the buffer-by-buffer detection shows an
improvement of 1-4 orders of magnitude over the single buffer detection
method. Panel annotations: (a) improvement of 1 order of magnitude; (b)
1.5 orders of magnitude; (c) 4 orders of magnitude.

Fig. 9: Domain and Algorithm Comparison. For a compression (CP) /
downsampling (DS) factor of 0.30, the StructS-XCorr model in the
correlation domain shows 1-4 orders of magnitude higher detection accuracy
compared to: (i) compression with StructS-XCorr in the DCT domain, (ii)
downsampling with XCorr, and (iii) downsampling with StructS-XCorr.

Fig. 7 shows the above analysis, but using StructS-XCorr. It reveals two
interesting observations. First, similar to Fig. 6, we observe that
compressing a lower SNR signal more aggressively (i.e., with a smaller α)
results in an increase in estimation error. Second, and more importantly,
there is a 40% improvement in ranging accuracy for cases of high
compression and low SNR in Case-A and Case-C. However, there is no
appreciable improvement in Case-B (an environment with low noise).

Fig. 8 compares the detection accuracy of our proposed buffer-by-buffer
method against processing all the samples in a single buffer, using the
StructS-XCorr recovery method. For the reasons explained in Section 2.4.1,
the results show an improvement of at least 1 order of magnitude.

The sparse representation in the proposed correlation domain shows
significantly better accuracy, by 2 orders of magnitude (Fig. 9), compared
to the DCT domain (for α = 0.30) due to the sparsest depiction of the
ranging signal. For DCT domain processing, the recovered coefficients ŝ_1
were multiplied with the DCT basis to obtain an estimate of the received
signal x̂_1, and then cross-correlated with the reference signal p. Here
also, the recovery mechanism is based on StructS-XCorr.

Another simple (but deterministic) method of reducing the sample count is
to downsample x by a factor α, resulting in y. We verify its detection
accuracy in the correlation domain by using two different algorithms: (a)
standard cross-correlation and (b) StructS-XCorr with the following
formulation of the sparse approximation problem:

$$ (\ell_1):\quad \hat{s}_1 = \min \|s\|_{\ell_1} \quad \text{subject to:}\quad \|\Psi' s - y\|_2 < \epsilon \qquad (16) $$

The comparison results in Fig. 9 show that neither of these two
downsampling-based methods provides better estimates than the proposed
method of ℓ1-minimization and structured sparsity in the correlation
domain, where the improvement is of 2 orders of magnitude across all
experimental environments. Information embedding in random ensembles
preserves the ℓ2-norm (or energy) of its respective higher dimension
representation, and therefore the recovery accuracy is significantly
better than deterministically choosing samples and discarding information
(i.e., frequency components) by downsampling. This result therefore
supports the theoretical underpinning that there is an overwhelming
probability of correct recovery via ℓ1-minimization for dimensionality
reduction by random linear projection (Section 2.2). Since the recovery
techniques are based on ℓ1-minimization, we direct the readers to [30]
for a systematic benchmark of their performance.
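The energy-preservation property of random ensembles mentioned above can be checked numerically. The sketch below is only an illustration under assumed parameters (n = 1000, m = 300, a fixed random seed, hypothetical function names): it projects a random signal with i.i.d. ±1/√m entries and compares the squared ℓ2-norm before and after projection.

```python
import math
import random

def squared_norm(v):
    """Squared l2-norm (energy) of a vector."""
    return sum(a * a for a in v)

def random_projection(x, m, rng):
    """Project x onto m rows of i.i.d. +/-1 entries scaled by 1/sqrt(m)."""
    scale = 1.0 / math.sqrt(m)
    return [scale * sum(rng.choice((-1, 1)) * v for v in x) for _ in range(m)]

rng = random.Random(1)
n, m = 1000, 300                                  # alpha = m / n = 0.30
x = [rng.gauss(0.0, 1.0) for _ in range(n)]
y = random_projection(x, m, rng)
energy_ratio = squared_norm(y) / squared_norm(x)  # expected to be close to 1
print(round(energy_ratio, 3))
```

The ratio concentrates around 1 as m grows, which is the property the recovery guarantee relies on; the sketch does not model the ℓ1 recovery step itself.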

Adaptive estimation: compression factor. The design of an adaptive
mechanism for α requires estimating the received SNR. We propose two
different approaches: first, with a BS feedback to the receiver, and
second, on the receiver itself.

For the BS-feedback mechanism, we utilize empirical information from the
peak detection algorithm. In Section 2.4.1, we considered the scenarios
where the number of buffers with a valid peak, b̂, is at least 1. If a
valid peak (i.e., at least 6 standard deviations above the mean) is not
detected in any buffer (i.e., b̂ = 0), then the detection is considered to
have failed. This implies that the recovered coefficients are noisy due to
a non-optimal α for the respective measurements (characterized by their
SNR). This was precisely the reason for modifying the peak selection
criteria in the previous subsection, where we observed large errors in
peak positions for magnitudes below the specified threshold. The
BS-feedback algorithm starts with the initial knowledge of whether a valid
peak was determined with α = 0.30. If the detection succeeds, then α is
decremented by a step size of 0.05 and the signal is compressed again.
This process is iterated until the detection fails, in which case the
previous α value is selected. On the other hand, if no valid peak was
encountered for the starting case, α is incremented in steps of 0.05 and
the entire process is repeated until the detection succeeds.
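The feedback search described above can be sketched as a small loop. The callable `detect` is a hypothetical stand-in for "compress with this α, transmit, reconstruct at the BS, and report whether a valid peak was found"; the lower and upper limits of 0.05 and 1.00 are assumptions made for the sketch.

```python
def bs_feedback_alpha(detect, start=0.30, step=0.05, max_alpha=1.00):
    """Search for the smallest alpha (in multiples of step) that still yields
    a valid peak, starting from `start`. Returns None if detection never succeeds."""
    k = round(start / step)          # work in integer multiples of step
    k_max = round(max_alpha / step)
    if detect(k * step):
        # Detection succeeded: keep decreasing alpha until it fails,
        # then keep the last value that worked.
        while k > 1 and detect((k - 1) * step):
            k -= 1
        return round(k * step, 2)
    # Detection failed at the start: increase alpha until it succeeds.
    while k < k_max:
        k += 1
        if detect(k * step):
            return round(k * step, 2)
    return None

# Toy detectors: detection succeeds once alpha reaches a hidden minimum.
high_snr = bs_feedback_alpha(lambda a: a >= 0.20 - 1e-9)
low_snr = bs_feedback_alpha(lambda a: a >= 0.45 - 1e-9)
```

Every call to `detect` corresponds to one additional round of measurements, which is the transmission overhead of this approach.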

TABLE 2: Projections vs. Accuracy. A positive value indicates more
projections or a higher reconstruction error compared to the threshold
α = 0.30.

Scenario       Projections (%)   Accuracy (%)
BS-Feedback    101.16            -1.75
Receiver       -17.55            5.26

A major drawback of the feedback approach is the additional measurements
(that translate to transmission overhead), and the associated delay and
power usage for deriving α. Therefore, we introduce this functionality on
the receiver by a simple power estimation algorithm. The ratio ρ of the
peak signal amplitude to the average of the absolute values in the sampled
signal is calculated, and a corresponding α is selected according to the
following empirically chosen criteria:
α = {{0.05 : ρ > 30}, {0.10 : 20 < ρ < 30}, {0.20 : 15 < ρ < 20},
{0.30 : 10 < ρ < 15}, {0.50 : 5 < ρ < 10}, {1.00 : ρ < 5}}.
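The receiver-side selection rule is simple enough to sketch directly. The function names are hypothetical, and the treatment of values that fall exactly on a threshold (assigned to the next, more conservative α here) is an assumption, since the criteria above do not specify it.

```python
def peak_to_mean_ratio(samples):
    """Ratio of the peak absolute amplitude to the mean absolute amplitude."""
    mags = [abs(s) for s in samples]
    mean_abs = sum(mags) / len(mags)
    return max(mags) / mean_abs if mean_abs > 0 else 0.0

def select_alpha(rho):
    """Map the ratio rho to a compression factor using the listed thresholds."""
    for lower_bound, alpha in ((30, 0.05), (20, 0.10), (15, 0.20),
                               (10, 0.30), (5, 0.50)):
        if rho > lower_bound:
            return alpha
    return 1.00   # very weak signal: keep every sample

rho = peak_to_mean_ratio([0, 0, 0, 4])   # peak 4, mean of magnitudes 1
alpha = select_alpha(rho)
```

Because the rule needs only one pass over the samples and a comparison chain, it avoids the extra measurement rounds of the feedback scheme.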

For our analysis, we randomly selected 1000 measurements pertaining to
different SNR levels in the indoor lecture theatre (Case-B). The
respective α was estimated using the above two methods and their
performance was compared against our empirically selected threshold value
of α = 0.30. Table 2 reports their performance trade-off, where the
BS-feedback obtains high accuracy but requires 2 times more measurements,
while the receiver estimation approach takes fewer measurements and
obtains only a 5% worse accuracy.

3 Evaluation 

We present the design and implementation of an end-to-end acoustic ranging
system using constrained WSN platforms in this section, followed by
evaluation results. Fast data acquisition and compression on the receiver
node was the underlying system rationale; hence, all design decisions
favored maximum RAM utilization over the use of external flash (which
would introduce additional latency).

3.1 System Design on Constrained Platforms

The system comprised the Tmote Invent (as listener), our designed sensor
mote (as beacon) and a network interface to the base-station (Fig. 10).

Transmitter. The beacon node [31] comprised our WSN platform along with a
custom designed audio daughter board that included four TI TLV320AIC3254
audio codecs and the Bluetechnix CM-BF537E digital signal processor
module. The transmitting front-end of the beacon mote consisted of a power
amplifier driving a tweeter (speaker) transducer (VIFA 3/4" tweeter module
MICRO). The tweeter (size: [2 × 2 × 1] cm) had a fairly uniform and high
frequency response of ≈ 22 dB above the noise floor between 1-10 kHz.

Receiver. Tmote Invent [32] was used as the listener node due to its
low-cost and low-power features (100 times more power efficient than the
DSP on the transmitter) that are expected from a WSN platform. The
receiving front-end consisted of an omni-directional electret microphone
(Panasonic WM-61B) attached to an Analog Devices SSM2167 preamplifier. It
allows omni-directional acquisition in the range 20 Hz - 10 kHz, and has a
near-flat frequency response between 3-7 kHz that is 10 dB above the noise
floor. High-rate audio data collection was achieved using the DMA
controller packaged with the MSP430 MCU. However, the MSP430 DMA truncates
the 12-bit ADC data to 8 bits rather than storing it in two bytes, and so
results in a data resolution loss of 4 bits.

Ranging/Detection methodology. The system uses the V-TDOA of RF and
acoustic signals to measure the beacon-to-listener distance. The beacon
initiates the ranging process by periodically transmitting an RF signal
followed by an acoustic pulse after a fixed time interval. The fast
propagating RF pulse reaches the listener almost instantaneously and
synchronizes the clocks on both devices, following which the TDOA is
measured after the arrival of the acoustic pulse. The ranging signal was a
linear chirp of [3-7] kHz/0.01 s and was transmitted at an acoustic
pressure level of 70 dB. The DAC on the audio codec of the beacon node was
programmed to sample at 48 kHz, while the ADC on the receiver Tmote was
configured to acquire at 15 kHz.

The time taken for sound to travel the maximum range d_c at speed v_s,
together with the transmitted chirp length t_p, bounds the time within
which the signal must reach the receiver. For t_p = 0.01 s and d_c ≈ 10 m,
the recording of the signal must be completed by 0.03 s. We include an
additional 0.01 s to compensate for reverberation time (t_c), and set the
recording time to 0.04 s. Following the buffer-by-buffer compression
method, the signal was spread across 5 buffers. A measurement matrix was
stored in the RAM that contained i.i.d. entries sampled from a symmetric
Bernoulli distribution. We postponed the multiplication of the matrix
entries with the constant (1/√m) until the recovery stage at the BS.
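Deferring the (1/√m) scaling means the node only has to add or subtract integer ADC samples. The following sketch illustrates that split under stated assumptions: the function names are hypothetical, the sign matrix is stored as +1/-1 values for readability, and the scaling uses the number of measurements passed to the base-station routine.

```python
import math
import random

def bernoulli_signs(m, n, rng):
    """m x n matrix of i.i.d. +1/-1 entries (symmetric Bernoulli)."""
    return [[rng.choice((-1, 1)) for _ in range(n)] for _ in range(m)]

def compress_on_mote(signs, samples):
    """Integer-only compression: each measurement is a signed sum of samples,
    so no multiplication, division or floating point is needed on the node."""
    measurements = []
    for row in signs:
        acc = 0
        for sign, value in zip(row, samples):
            acc = acc + value if sign > 0 else acc - value
        measurements.append(acc)
    return measurements

def rescale_at_bs(measurements):
    """Apply the postponed 1/sqrt(m) scaling before recovery at the base station."""
    scale = 1.0 / math.sqrt(len(measurements))
    return [scale * v for v in measurements]

signs = [[1, -1, 1], [-1, -1, 1]]           # small fixed example matrix
y_int = compress_on_mote(signs, [10, 20, 30])
y = rescale_at_bs(y_int)

# Same routine with a random matrix: 120 samples compressed to 36 measurements.
rng = random.Random(2)
y_rand = compress_on_mote(bernoulli_signs(36, 120, rng), list(range(120)))
```

Since scaling by a constant commutes with the linear projection, applying it at the BS gives the same measurements as scaling the matrix on the node.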

The listener acquires the audio samples, compresses and stores these
measurements in the RAM over a period of 5 iterations, and then transfers
them to the BS. These measurements are again divided into their respective
buffers and reconstructed to obtain the coefficients. The detection
process is the same as explained in Section 2.4.1; however, we made two
minor modifications. First, due to a higher receiver noise floor, we set
the criteria for selecting the first tallest peak to 3 (instead of 6)
standard deviations above the mean. Second, as each sample corresponds to
2.2 cm of distance (at a sampling rate of 15 kHz), we used a simple
parabolic interpolation method to obtain finer resolution. This additional
step identifies the position of the first neighboring peak on the left and
right of the selected ranging peak, finds the parabola that passes through
these points, and calculates the time coordinate of the maximum of this
parabola, which gives the range estimate.
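A common three-point form of parabolic interpolation is sketched below. It is an assumption that the neighbouring points are the samples immediately to the left and right of the ranging peak; the function names are hypothetical.

```python
def parabolic_offset(y_left, y_peak, y_right):
    """Sub-sample offset (in samples, relative to the peak index) of the vertex
    of the parabola through three equally spaced points."""
    denom = y_left - 2.0 * y_peak + y_right
    if denom == 0:
        return 0.0                       # flat neighbourhood: keep the integer peak
    return 0.5 * (y_left - y_right) / denom

def refined_range(peak_index, mags, fs_hz, v_sound):
    """Convert the interpolated peak position to a distance."""
    offset = parabolic_offset(mags[peak_index - 1], mags[peak_index],
                              mags[peak_index + 1])
    return (peak_index + offset) / fs_hz * v_sound

# Synthetic correlation peak whose true maximum lies at sample 10.3.
mags = [5.0 - (i - 10.3) ** 2 for i in range(20)]
offset = parabolic_offset(mags[9], mags[10], mags[11])
r = refined_range(10, mags, 15000.0, 343.0)
```

For an exactly parabolic peak the vertex is recovered without error; on measured data the result is an approximation whose quality depends on the peak shape.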


Fig. 10: System architecture. End-to-end acoustic ranging system using
constrained WSN platforms.

Fig. 11: End-to-End acoustic ranging system using constrained WSN
platforms. Ranging results.


3.2 Performance Results

Ranging. The ranging experiments were performed in the same three
environments as mentioned in Section 2.4.2: (a) Case-A: outdoor walkway,
(b) Case-B: lecture theatre, and (c) Case-C: meeting room. In all the
setups, the listener node was fixed while the beacon node was moved along
the direct LOS in a controlled manner. The ground truth was established
using a measuring tape and markers. 30 observations were collected for
every experiment. While the ranging performance of S-XCorr in the absolute
sense was studied in our previous work, here we study the relative
performance of S-XCorr and StructS-XCorr with respect to the (best-case)
XCorr in terms of the relative mean error and its deviation (Fig. 11).

As shown in Fig. 11(a), Case-A recorded the best result, where the
cumulative error (mean + deviation) of StructS-XCorr with respect to
S-XCorr was approximately 1 cm for distances between 1-4 m, but improved
by more than 2 cm for ranges beyond that. Measurements after 5 m were
compressed with α = 0.40. Due to the decrease in the sparsity levels with
lower SNR, the measurements from [6-8] m were compressed with a higher α
of 0.40. The audio recordings after 8 m were highly noisy, and therefore
required an even higher α value for compression to compensate for the
reduced sparsity levels. However, due to non-availability of RAM space for
storing the additional entries of the new measurement matrix Φ, range
estimates beyond 8 m could not be processed. Fig. 11(b) and Fig. 11(c)
show the results for Case-B and Case-C, and suggest that StructS-XCorr did
not significantly outperform S-XCorr. Fig. 11 supports the same
observation that was noted in Fig. 6 and Fig. 7, where StructS-XCorr is
beneficial in conditions of low SNR (typical of Case-A).

Localization. In these experiments, 5 listener nodes were placed at fixed
(known) locations in an outdoor setting (Case-A) to obtain the (unknown)
location coordinates of the beacon node. This layout follows a previously
presented approach. The speaker had a fairly uniform signal strength
within the directionality cone of ±45° (with a 2 dB decrease from 0° to
45°); therefore, all the 5 listeners were confined within this perimeter
with their microphones facing the speaker.

The beacon initiated the ranging process and the corresponding acoustic
chirp was recorded by the 5 listeners. A simple time division multiple
access (TDMA) approach was followed for orderly data transfer, wherein
each listener transferred the compressed data in a preset time slot. The
distances between the beacon and the receivers were estimated at the BS,
which was followed by the linear least squares localization algorithm [31]
to calculate the 2D location of the beacon node. Fig. 12 shows the node
placements, where the listener and beacon node(s) are depicted as circles
and squares respectively, along with the estimated beacon location using
the two methods. The cumulative (mean + deviation) localization error
between S-XCorr and StructS-XCorr was less than 1 cm. As the localization
error is upper bounded by its ranging errors, we expect similar relative
performance in Case-B and Case-C, which show a maximum ranging error
difference of 2 cm.

Fig. 12: End-to-End acoustic ranging system using constrained WSN
platforms. Localization results for Case-A.
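One generic way to turn range estimates into a 2D position is to linearize the range equations against a reference anchor and solve the resulting least squares problem. The sketch below shows this textbook linearization; it is not claimed to be the exact algorithm of the cited reference, and the function name and anchor layout are hypothetical.

```python
import math

def locate_2d(anchors, ranges):
    """Linear least squares position estimate from anchor positions and ranges.

    Subtracting the range equation of anchor 0 from that of anchor i gives
    2(xi - x0) x + 2(yi - y0) y = r0^2 - ri^2 + xi^2 - x0^2 + yi^2 - y0^2,
    which is linear in (x, y) and is solved via the 2x2 normal equations."""
    (x0, y0), r0 = anchors[0], ranges[0]
    rows, rhs = [], []
    for (xi, yi), ri in zip(anchors[1:], ranges[1:]):
        rows.append((2.0 * (xi - x0), 2.0 * (yi - y0)))
        rhs.append(r0 ** 2 - ri ** 2 + xi ** 2 - x0 ** 2 + yi ** 2 - y0 ** 2)
    a11 = sum(a * a for a, _ in rows)
    a12 = sum(a * b for a, b in rows)
    a22 = sum(b * b for _, b in rows)
    b1 = sum(a * c for (a, _), c in zip(rows, rhs))
    b2 = sum(b * c for (_, b), c in zip(rows, rhs))
    det = a11 * a22 - a12 * a12
    if det == 0:
        raise ValueError("anchors are collinear; position is not identifiable")
    return ((a22 * b1 - a12 * b2) / det, (a11 * b2 - a12 * b1) / det)

# Five listeners at known positions and noise-free ranges to a beacon at (1.5, 2.5).
anchors = [(0.0, 0.0), (4.0, 0.0), (0.0, 4.0), (4.0, 4.0), (2.0, 5.0)]
beacon = (1.5, 2.5)
ranges = [math.hypot(beacon[0] - ax, beacon[1] - ay) for ax, ay in anchors]
estimate = locate_2d(anchors, ranges)
```

With noise-free ranges the beacon position is recovered exactly; with noisy ranges the estimate inherits the ranging errors.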

Energy Consumption. Table 3 reports the time and energy consumed for each
operational step on the listener node. The cumulative time spent in
compression and radio transfer is ≈ 0.0640 s, which is more than 100 times
faster than performing time-domain cross-correlation on the node itself.
Its equivalent frequency-domain cross-correlation requires two FFT
operations and one IFFT operation. This translates to significant energy
saving on the receiver device, wherein a typical AA or AAA battery can
last for over a decade (Fig. 13). When optimized for speed, an FFT over a
512-sample window of an 8 kHz signal takes 0.5 s execution time on TelosB
[33], and so, for our case of 750 samples, would take ≈ 2.2 s, which is
still 34 times slower.

Fig. 13: Estimation of Battery Longevity. Compared to the method of
executing the standalone cross-correlation algorithm on the receiver
platform, the combination of compression (by random ensembles) followed by
transmission to the base-station will extend the battery lifetime to
multiple years.

With respect to compression performance, the popular LZ77-based algorithm
'gzip' achieves a slightly better compressibility of α = 0.27 (Table 4).
However, due to its lossless nature of compression, it is not robust to
information loss (packet drops), which is common in low-power sensor
networks. In contrast, the performance degradation of our approach is less
severe and has the same effect as compressing with a smaller α (Fig. 6). A
similar but energy efficient algorithm proposed by Sadler et al. [34],
S-LZW, reports an execution time of approximately 0.05 s for 528 bytes of
data, and therefore its equivalent compression cost for 750 bytes would be
approximately 0.075 s (≈ 12 times slower than our technique). These
statistics suggest that although compression by random ensembles is not
the best compression method, its benefits of greater energy savings along
with faster data processing make it a good trade-off between compression
and computation time, accuracy (in case of data loss), and energy
consumption. For example, if applications can tolerate 100 cm localization
accuracy, the proposed method requires only approximately 5% of the
measurements (Fig. 6). Furthermore, in the event of packet loss, S-LZW
needs to either retransmit the entire compressed data segment or employ an
expensive end-to-end reliable communication protocol. On the other hand,
the performance of the proposed protocol degrades gracefully with packet
losses, as it can still recover the ranging information, but with larger
errors (Fig. 6).


TABLE 3: Performance Analysis: Tmote Invent

Operation                          Time (s)   Energy (mJ)
Audio Acquisition                  0.0665     20.50
Compression                        0.0060     1.85
Radio Transfer (Compressed Data)   0.0580     17.88
Cross-correlation (Time-domain)    15.6250    4816.00


TABLE 4: Compression Factor (α) for LZ77-based Compression Algorithm
'gzip'. Dataset collected by the POC System (Section 3).

Scenario                     Mean α   Deviation α
Case-A: Very-low Multipath   0.27     0.005
Case-B: Low Multipath        0.27     0.005
Case-C: High Multipath       0.28     0.009
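The speed comparisons quoted in the Energy Consumption discussion can be reproduced from the Table 3 timings. The short calculation below is a reconstruction, not the authors' own derivation: it assumes that the FFT time scales linearly with the window length and that frequency-domain cross-correlation costs three transforms (two FFTs and one IFFT). Under those assumptions it yields the quoted figures of about 2.2 s and 34 times.

```python
# Timings from Table 3 (seconds).
t_compress = 0.0060
t_radio = 0.0580
t_td_xcorr = 15.6250

t_ours = t_compress + t_radio                 # compression + radio transfer
speedup_vs_td = t_td_xcorr / t_ours           # time-domain XCorr on the node

# Frequency-domain XCorr on the node (assumed: linear scaling, 3 transforms).
t_fft_512 = 0.5                               # reported FFT time for 512 samples
t_fd_xcorr = 3 * t_fft_512 * 750 / 512
slowdown_fd = t_fd_xcorr / t_ours

# S-LZW compression cost quoted for 750 bytes, relative to our compression step.
t_slzw = 0.075
slowdown_slzw = t_slzw / t_compress

print(round(t_ours, 4), round(speedup_vs_td), round(t_fd_xcorr, 1),
      round(slowdown_fd), round(slowdown_slzw, 1))
```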


4 Related Work 

We broadly categorize the related work based on the detection mechanism
used in existing acoustic, ultrasound and RF localization systems in WSN.

Non Cross-correlation: Active Bat, Cricket, Medusa [36] and SpiderBat [37]
are ultrasound positioning systems. Range measurements are performed by
calculating the TDOA between two synchronously sent RF and ultrasonic
pulses at the receiver. The ranging pulse is a single frequency (40 kHz)
sinusoid and its arrival is detected by triggering an interrupt pin of the
microcontroller when its leading edge exceeds a preset threshold. Due to
the functional simplicity, the low-power microcontrollers (ATmega/MSP430
series) used in these platforms are efficient in managing the on-board
processing. Kusy et al. [38] introduced radio interferometry to design a
low-cost RF-based positioning system on the Mica2 platform. This method
measures the relative phase offsets of the interference field (created by
two nodes transmitting RF pulses at slightly different frequencies) at
different locations to obtain the position estimate of the transmitters.
However, these techniques are not robust against multipath
characteristics, and so no results have been published for complex
cluttered environments.

Cross-correlation: The systems proposed by Kushwaha et al. and Hazas et
al. [4], AENSBox, BeepBeep and TWEET are existing acoustic broadband
systems. Despite their differences in signal design, synchronization
schemes and methods to improve the received SNR, they share a common
detection mechanism: cross-correlation. These systems have been reported
to withstand considerable channel multipath and environmental noise, and
so provide reliable and precise distance estimates over a long coverage
range. However, the capability of these systems has been achieved by using
DSPs/smart phones that typically consume more power and resources.

The theory of sparse representation helps to efficiently embed information
without much loss (which serves the purpose of storage and transmission),
followed by its recovery from an underdetermined system. Although we
follow an approach similar to that of Wright et al. in face recognition,
the scope of our problem is completely different. We design a new
dictionary specifically for cross-correlation based detection and ranging,
as opposed to feature extraction for face classification.

Previous work by Whitehouse et al. and Sallai et al. on acoustic ranging
in resource constrained sensor networks (using the MICA platform)
categorically states that the limited availability of RAM was the most
serious constraint in their system implementation. The reported ranging
results have an average error of 8.18 cm over a distance of 1-9 m,
obtained by repeating the ranging signal 16 times, which results in
significant runtime and energy overhead. Using StructS-XCorr, our acoustic
ranging system was able to overcome this problem, and was also able to
provide similar performance (mean error of < 10 cm over 1-10 m) with fewer
samples.

The StructS-XCorr approach has several merits. First, it provides a simple
dimensionality reduction mechanism (that can be implemented on a typical
WSN node) as a viable alternative to the computationally intensive
cross-correlation function. Second, it requires processing a significantly
smaller dataset (proportional to the logarithmic count of the acquired
signal samples) to obtain accuracies comparable to cross-correlation (the
state-of-the-art detection technique). At the local device end, the
simplicity of this operation translates into appreciable resource savings.
Finally, it is independent of the physical signal (radio/acoustic) and
medium (air/water), and is therefore a versatile framework for a wide
range of uses. However, the centralized processing framework is its
primary drawback. We argue that it is a reasonable trade-off for achieving
the performance of cross-correlation on mote-class devices. We envision
that, besides having an impact on current location sensing systems, it
would create a new drive for WSN applications where the requirement for
reliable location information on constrained network embedded sensing
elements holds more importance than centralized computation.

5 Conclusion and Discussion 

We presented a new information processing approach for range estimation:
cross-correlation via sparse representation and structured sparsity. We
showed that exploiting the structure of sparsity is critical for
high-performance signal processing operations on high-dimensional data,
such as cross-correlation. The sparsity of the underlying signal in our
proposed correlation domain aids the recovery mechanism in obtaining
reliable range estimates. The main idea was to use a Hankel matrix with
the time-shifted reference signal as the dictionary, which leads to a
sparser representation than processing in other domains. The design of the
correlation dictionary and information recovery using structured sparsity
are the main contributions of this work, which allow for ToA estimation
with or without compressed sensing. We designed its theoretical framework
and validated its working through empirical system tests and
characterization studies. Considering the implementation simplicity in the
acoustic domain, we developed an end-to-end acoustic ranging system using
COTS sensor platforms to verify our hypothesis.

Our work in this paper is guided by the current hardware limitations of
low-cost and low-power sensor platforms. We believe that the key
observations and principles derived here will find their application [39],
[40] in location sensing systems that have constrained hardware resources
to handle the bulk of data processing.